Box streaming

In this notebook, we will explore other capabilities of the hvPlot ecosystem.

In particular, we will look at the hist() and BoxDraw stream methods, using the AIMS eReefs database.

See also

For other inspirational functionalities, you might be interested in the examples from the HoloViews and GeoViews gallery.

Load the required Python libraries

First of all, load the necessary libraries. These are the ones we discussed previously:

  • numpy

  • matplotlib

  • cartopy

  • pandas

  • xarray

  • holoviews

  • geoviews

import os
import numpy as np
import pandas as pd
import xarray as xr

import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
cartopy.config['data_dir'] = os.getenv('CARTOPY_DIR', cartopy.config.get('data_dir'))

import cmocean
import hvplot.xarray

import holoviews as hv
from holoviews import opts, dim

import geoviews as gv
import geoviews.feature as gf
from geoviews import tile_sources as gvts

from cartopy import crs

gv.extension('bokeh')

import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) 

Build a multi-file dataset

We will use the open_mfdataset function from xArray to open multiple netCDF files into a single xarray Dataset.

We will query load the GBR4km dataset from the AIMS server, so let’s first define the base URL:

base_url = "http://thredds.ereefs.aims.gov.au/thredds/dodsC/s3://aims-ereefs-public-prod/derived/ncaggregate/ereefs/gbr4_v2/daily-monthly/EREEFS_AIMS-CSIRO_gbr4_v2_hydro_daily-monthly-"

For the sake of the demonstration, we will only use 2 months:

month_st = 3   # Starting month (March)
month_ed = 4   # Ending month (April)
year = 2018    # Year

# Based on the server the file naming convention 
hydrofiles = [f"{base_url}{year}-{month:02}.nc" for month in range(month_st, month_ed+1)]
hydrofiles
['http://thredds.ereefs.aims.gov.au/thredds/dodsC/s3://aims-ereefs-public-prod/derived/ncaggregate/ereefs/gbr4_v2/daily-monthly/EREEFS_AIMS-CSIRO_gbr4_v2_hydro_daily-monthly-2018-03.nc',
 'http://thredds.ereefs.aims.gov.au/thredds/dodsC/s3://aims-ereefs-public-prod/derived/ncaggregate/ereefs/gbr4_v2/daily-monthly/EREEFS_AIMS-CSIRO_gbr4_v2_hydro_daily-monthly-2018-04.nc']

Loading the dataset into xArray

Using xArray, we open these files into a Dataset:

ds_hydro = xr.open_mfdataset(hydrofiles)
ds_hydro
<xarray.Dataset>
Dimensions:      (k: 17, latitude: 723, longitude: 491, time: 61)
Coordinates:
  * time         (time) datetime64[ns] 2018-02-28T14:00:00 ... 2018-04-29T14:...
    zc           (k) float64 dask.array<chunksize=(17,), meta=np.ndarray>
  * latitude     (latitude) float64 -28.7 -28.67 -28.64 ... -7.096 -7.066 -7.036
  * longitude    (longitude) float64 142.2 142.2 142.2 ... 156.8 156.8 156.9
Dimensions without coordinates: k
Data variables:
    mean_cur     (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 723, 491), meta=np.ndarray>
    salt         (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 723, 491), meta=np.ndarray>
    temp         (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 723, 491), meta=np.ndarray>
    u            (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 723, 491), meta=np.ndarray>
    v            (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 723, 491), meta=np.ndarray>
    mean_wspeed  (time, latitude, longitude) float32 dask.array<chunksize=(31, 723, 491), meta=np.ndarray>
    eta          (time, latitude, longitude) float32 dask.array<chunksize=(31, 723, 491), meta=np.ndarray>
    wspeed_u     (time, latitude, longitude) float32 dask.array<chunksize=(31, 723, 491), meta=np.ndarray>
    wspeed_v     (time, latitude, longitude) float32 dask.array<chunksize=(31, 723, 491), meta=np.ndarray>
Attributes: (12/21)
    Conventions:                     CF-1.0
    NCO:                             4.4.4
    Run_ID:                          2
    _CoordSysBuilder:                ucar.nc2.dataset.conv.CF1Convention
    aims_ncaggregate_buildDate:      2020-08-21T14:27:56+10:00
    aims_ncaggregate_datasetId:      products__ncaggregate__ereefs__gbr4_v2__...
    ...                              ...
    paramhead:                       GBR 4km resolution grid
    shoc_version:                    v1.1 rev(5620)
    technical_guide_link:            https://eatlas.org.au/pydio/public/aims-...
    technical_guide_publish_date:    2020-08-18
    title:                           eReefs AIMS-CSIRO GBR4 Hydrodynamic v2 d...
    DODS_EXTRA.Unlimited_Dimension:  time

Clip the Dataset

To reduce the Dataset size we will clip the spatial extent based on longitudinal and latitudinal values.

As we already saw this is easely done using the sel function with the slice method.

min_lon = 146.5   # lower left longitude
min_lat = -20     # lower left latitude
max_lon = 148     # upper right longitude
max_lat = -17     # upper right latitude

# Defining the boundaries
lon_bnds = [min_lon, max_lon]
lat_bnds = [min_lat, max_lat]

# Performing the reduction 
ds_hydro_clip = ds_hydro.sel(latitude=slice(*lat_bnds), longitude=slice(*lon_bnds))
ds_hydro_clip
<xarray.Dataset>
Dimensions:      (k: 17, latitude: 100, longitude: 50, time: 61)
Coordinates:
  * time         (time) datetime64[ns] 2018-02-28T14:00:00 ... 2018-04-29T14:...
    zc           (k) float64 dask.array<chunksize=(17,), meta=np.ndarray>
  * latitude     (latitude) float64 -20.0 -19.97 -19.94 ... -17.09 -17.06 -17.03
  * longitude    (longitude) float64 146.5 146.5 146.6 ... 147.9 148.0 148.0
Dimensions without coordinates: k
Data variables:
    mean_cur     (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    salt         (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    temp         (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    u            (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    v            (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    mean_wspeed  (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
    eta          (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
    wspeed_u     (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
    wspeed_v     (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
Attributes: (12/21)
    Conventions:                     CF-1.0
    NCO:                             4.4.4
    Run_ID:                          2
    _CoordSysBuilder:                ucar.nc2.dataset.conv.CF1Convention
    aims_ncaggregate_buildDate:      2020-08-21T14:27:56+10:00
    aims_ncaggregate_datasetId:      products__ncaggregate__ereefs__gbr4_v2__...
    ...                              ...
    paramhead:                       GBR 4km resolution grid
    shoc_version:                    v1.1 rev(5620)
    technical_guide_link:            https://eatlas.org.au/pydio/public/aims-...
    technical_guide_publish_date:    2020-08-18
    title:                           eReefs AIMS-CSIRO GBR4 Hydrodynamic v2 d...
    DODS_EXTRA.Unlimited_Dimension:  time

Change coordinate name

To make it easier to process the xArray dataset, we change the zc coordinate to the same name as its dimension (i.e. k).

This is done like this:

# Creation of a new coordinate k with the same array as zc
ds_hydro_clip.coords['k'] = ('zc',ds_hydro_clip.zc)

# Swapping `zc` with `k`
ds_hydro_clip = ds_hydro_clip.swap_dims({'zc':'k'})

# Ok we can now safely remove `zc`
ds_hydro_clip = ds_hydro_clip.drop(['zc'])
ds_hydro_clip
<xarray.Dataset>
Dimensions:      (k: 17, latitude: 100, longitude: 50, time: 61)
Coordinates:
  * time         (time) datetime64[ns] 2018-02-28T14:00:00 ... 2018-04-29T14:...
  * latitude     (latitude) float64 -20.0 -19.97 -19.94 ... -17.09 -17.06 -17.03
  * longitude    (longitude) float64 146.5 146.5 146.6 ... 147.9 148.0 148.0
  * k            (k) float64 -145.0 -120.0 -103.0 -88.0 ... -5.55 -3.0 -1.5 -0.5
Data variables:
    mean_cur     (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    salt         (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    temp         (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    u            (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    v            (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 17, 100, 50), meta=np.ndarray>
    mean_wspeed  (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
    eta          (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
    wspeed_u     (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
    wspeed_v     (time, latitude, longitude) float32 dask.array<chunksize=(31, 100, 50), meta=np.ndarray>
Attributes: (12/21)
    Conventions:                     CF-1.0
    NCO:                             4.4.4
    Run_ID:                          2
    _CoordSysBuilder:                ucar.nc2.dataset.conv.CF1Convention
    aims_ncaggregate_buildDate:      2020-08-21T14:27:56+10:00
    aims_ncaggregate_datasetId:      products__ncaggregate__ereefs__gbr4_v2__...
    ...                              ...
    paramhead:                       GBR 4km resolution grid
    shoc_version:                    v1.1 rev(5620)
    technical_guide_link:            https://eatlas.org.au/pydio/public/aims-...
    technical_guide_publish_date:    2020-08-18
    title:                           eReefs AIMS-CSIRO GBR4 Hydrodynamic v2 d...
    DODS_EXTRA.Unlimited_Dimension:  time

Here we want to evaluate the surface salinity evolution over the temporal interval.

To proceed, we will only select the top 10 m of the dataset based on the z-coordinates definition.

# Take the top 10 m based on the z-coordinate (k dimension)
slice_ds = ds_hydro_clip.sel(k=slice(-10, 0)).drop(['u','v','eta',
                                                    'mean_wspeed',
                                                    'wspeed_u',
                                                    'wspeed_v'])
slice_ds
<xarray.Dataset>
Dimensions:    (k: 5, latitude: 100, longitude: 50, time: 61)
Coordinates:
  * time       (time) datetime64[ns] 2018-02-28T14:00:00 ... 2018-04-29T14:00:00
  * latitude   (latitude) float64 -20.0 -19.97 -19.94 ... -17.09 -17.06 -17.03
  * longitude  (longitude) float64 146.5 146.5 146.6 146.6 ... 147.9 148.0 148.0
  * k          (k) float64 -8.8 -5.55 -3.0 -1.5 -0.5
Data variables:
    mean_cur   (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 5, 100, 50), meta=np.ndarray>
    salt       (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 5, 100, 50), meta=np.ndarray>
    temp       (time, k, latitude, longitude) float32 dask.array<chunksize=(31, 5, 100, 50), meta=np.ndarray>
Attributes: (12/21)
    Conventions:                     CF-1.0
    NCO:                             4.4.4
    Run_ID:                          2
    _CoordSysBuilder:                ucar.nc2.dataset.conv.CF1Convention
    aims_ncaggregate_buildDate:      2020-08-21T14:27:56+10:00
    aims_ncaggregate_datasetId:      products__ncaggregate__ereefs__gbr4_v2__...
    ...                              ...
    paramhead:                       GBR 4km resolution grid
    shoc_version:                    v1.1 rev(5620)
    technical_guide_link:            https://eatlas.org.au/pydio/public/aims-...
    technical_guide_publish_date:    2020-08-18
    title:                           eReefs AIMS-CSIRO GBR4 Hydrodynamic v2 d...
    DODS_EXTRA.Unlimited_Dimension:  time

Now we will average the value along the k coordinate using the mean function and then load the Xarray Dataset in memory:

vards = slice_ds.mean(dim='k').salt.load()

Visualise temporal changes

We first load data using both Cartopy and GeoViews to check what we have…

# Loading coastlines from Cartopy
coastline = gf.coastline(line_width=3, line_color='k').opts(projection=ccrs.PlateCarree(), scale='10m')

# Loading land mask from Cartopy
land = gf.land.options(scale='10m', fill_color='lightgray')

# Let's put the Xarray dataset into a GeoViews object
var_mask = gv.Dataset(vards, crs=crs.PlateCarree())

# Print salinity range:
minvar = vards.min().item()
maxvar = vards.max().item()
print('Salinity range: ',minvar,maxvar)
# var0 = vards.fillna(0)
# hv_ds = hv.Dataset(var0)

# Create stack of images grouped by time
im_mask = var_mask.to(gv.Image, ['longitude', 'latitude'], dynamic=True)
Salinity range:  6.842176914215088 35.55593490600586

GeoViews interactive plot…

# Positioning the slider at the bottom
hv.output(widget_location='bottom')

# Figure title name
titleName = slice_ds.salt.long_name+' '+slice_ds.salt.units

# Plotting the interactive view
im_mask.opts(active_tools=['wheel_zoom', 'pan'], cmap=cmocean.cm.curl,
             colorbar=True, width=450, height=400, clim=(34,36),
             title=titleName) * coastline * land
/usr/share/miniconda/envs/envireef/lib/python3.8/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/physical/ne_110m_land.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)

By sliding over time, we can quickly visualise the average surface salinity…

Histogram plot

The first thing we will do is customize the appearance of the elements used in the rest of the notebook, using opts.defaults to declare the options we want to use ahead of time (see the User Guide for details).

opts.defaults(
    opts.GridSpace(shared_xaxis=True, shared_yaxis=True),
    opts.Image(cmap=cmocean.cm.curl, clim=(34,36)),
    opts.Labels(text_color='white', text_font_size='8pt', text_align='left', text_baseline='bottom'),
    opts.Path(color='white'),
    opts.Spread(width=600),
    opts.VLine(color='black'),
    opts.Curve(width=400, framewise=True), 
    opts.Polygons(fill_alpha=0.2, line_color='white'),
    opts.Overlay(show_legend=False))

Using the .to interface as above, we can map the dimensions of our Dataset onto the dimensions of an Element.

To display an image, we will pick the Image element and specify the longitude and latitude as the two key dimensions of each Image. Since our dataset has only a single value dimension (salt), we don’t need to declare it explicitly:

im_hist = var_mask.to(gv.Image, 
                      ['longitude', 'latitude']).hist().opts(width=500, height=450, 
                                                             title=titleName)
im_hist

As usual, the unspecified key dimension time has become a slider widget.

The image can be saved as a html file by uncomenting the cell below:

# hv.save(im_hist, 'hist.html')

See also

To see the above example in full interactivity, you might want to use the following html link

Box stream

We will use the BoxDraw stream to draw region of interests (e.g. ROIs) over a set of surface salinity data, and use them to compute and display timeseries of the activity in the regions of interests.

Before going further we will work the Xarray dataset a bit…

First, we will change the latitude and longitude to integer as the Box stream does not seem to accept floating values (or at least I didn;t find the trick yet).

slice_ds.coords['y'] = ('latitude',slice_ds.latitude)
slice_ds.coords['x'] = ('longitude',slice_ds.longitude)
slice_ds2 = slice_ds.swap_dims({'latitude':'y'})
slice_ds2 = slice_ds2.swap_dims({'longitude':'x'})
slice_ds2 = slice_ds2.drop(['latitude'])
slice_ds2 = slice_ds2.drop(['longitude'])
slice_ds2 =  slice_ds2.assign_coords( x=(np.arange(slice_ds.x.shape[0])),
                                 y=(np.arange(slice_ds.y.shape[0])),
                                 ) 
slice_ds2
<xarray.Dataset>
Dimensions:   (k: 5, time: 61, x: 50, y: 100)
Coordinates:
  * time      (time) datetime64[ns] 2018-02-28T14:00:00 ... 2018-04-29T14:00:00
  * k         (k) float64 -8.8 -5.55 -3.0 -1.5 -0.5
  * y         (y) int64 0 1 2 3 4 5 6 7 8 9 10 ... 90 91 92 93 94 95 96 97 98 99
  * x         (x) int64 0 1 2 3 4 5 6 7 8 9 10 ... 40 41 42 43 44 45 46 47 48 49
Data variables:
    mean_cur  (time, k, y, x) float32 dask.array<chunksize=(31, 5, 100, 50), meta=np.ndarray>
    salt      (time, k, y, x) float32 dask.array<chunksize=(31, 5, 100, 50), meta=np.ndarray>
    temp      (time, k, y, x) float32 dask.array<chunksize=(31, 5, 100, 50), meta=np.ndarray>
Attributes: (12/21)
    Conventions:                     CF-1.0
    NCO:                             4.4.4
    Run_ID:                          2
    _CoordSysBuilder:                ucar.nc2.dataset.conv.CF1Convention
    aims_ncaggregate_buildDate:      2020-08-21T14:27:56+10:00
    aims_ncaggregate_datasetId:      products__ncaggregate__ereefs__gbr4_v2__...
    ...                              ...
    paramhead:                       GBR 4km resolution grid
    shoc_version:                    v1.1 rev(5620)
    technical_guide_link:            https://eatlas.org.au/pydio/public/aims-...
    technical_guide_publish_date:    2020-08-18
    title:                           eReefs AIMS-CSIRO GBR4 Hydrodynamic v2 d...
    DODS_EXTRA.Unlimited_Dimension:  time
saltXY = slice_ds2.mean(dim='k').salt.load()

The following is from the HoloViews example and define the ROIs functionality:

hv_ds = hv.Dataset(saltXY)

# Create stack of images grouped by time
im = hv_ds.to(hv.Image, ['x','y'], dynamic=True).opts(active_tools=['wheel_zoom', 'pan'], cmap=cmocean.cm.curl,
                     colorbar=True, width=450, height=400, clim=(34,36))

polys = hv.Polygons([])

box_stream = hv.streams.BoxEdit(source=polys)

# Declare an empty DataFrame to declare the types
empty = pd.DataFrame({'time': np.array([], dtype='datetime64[ns]'), 'salt': []})

def roi_curves(data):
    if not data or not any(len(d) for d in data.values()):
        return hv.NdOverlay({0: hv.Curve(empty, 'time', 'salt')})

    curves = {}
    data = zip(data['x0'], data['x1'], data['y0'], data['y1'])
    for i, (x0, x1, y0, y1) in enumerate(data):
        selection = hv_ds.select(x=(x0, x1), y=(y0, y1))
        curves[i] = hv.Curve(selection.aggregate('time', np.mean))
    return hv.NdOverlay(curves)

# Generate VLines by getting time value from the image frames
def vline(frame):
    return hv.VLine(frame.data.time.values)
vlines = im.apply(vline)

dmap = hv.DynamicMap(roi_curves, streams=[box_stream])

Important

To define an ROI, select the Box edit tool and double click to start defining the ROI and double click to finish placing the ROI:

(im * polys + dmap * vlines ).opts(title=titleName)